ABSTRACT

The recent expansion in our knowledge of protein- 
protein interactions (PPIs) has allowed the annota- 
tion and prediction of hundreds of thousands of 
interactions. However, the function of many of 
these interactions remains elusive. The interactions 
of Eukaryotic Linear Motif (iELM) web server 
provides a resource for predicting the function and 
positional interface for a subset of interactions 
mediated by short linear motifs (SLiMs). The iELM 
prediction algorithm is based on the annotated 
SLiM classes from the Eukaryotic Linear Motif 
(ELM) resource and allows users to explore both 
annotated and user-generated PPI networks for 
SLiM-mediated interactions. By incorporating the 
annotated information from the ELM resource, 
iELM provides functional details of PPIs. This can 
be used in proteomic analysis, for example, to 
infer whether an interaction promotes complex for- 
mation or degradation. Furthermore, details of the 
molecular interface of the SLiM-mediated inter- 
actions are also predicted. This information is dis- 
played in a fully searchable table, as well as 
graphically with the modular architecture of the 
participating proteins extracted from the UniProt 
and Phospho.ELM resources. A network figure is 
also presented to aid the interpretation of results. 
The iELM server supports single protein queries as 
well as large-scale proteomic submissions and is 
freely available at http://i.elm.eu.org.

INTRODUCTION 

The interactions of Eukaryotic Linear Motif (iELM) web
server facilitates the exploration of short linear motif
(SLiM)-mediated interfaces within protein-protein
interaction (PPI) networks (1). The importance of SLiMs in the
regulatory and signalling mechanisms of the cell is
becoming increasingly apparent, as highlighted by their 
use as molecular switches coordinating phase transitions 
in the cell (2) and their increasing association with disease 
(3-5). SLiMs are key components in a wide range of bio- 
logical pathways and are known to act as sites for 
post-translational modifications such as phosphorylation 
or ubiquitination, as targeting signals for particular 
subcellular locations and as ligand-binding sites for 
protein recruitment (6,7). The majority of known motifs 
bind onto the surface of globular domains and exhibit 
specificity for a particular subgroup of a domain family 
(1). SLiMs tend to be just 3-10 amino acids in length with 
only 2-5 residues responsible for the majority of the 
binding affinity and specificity (6). This means that 
discriminating bioinformatically between a stochastic 
match and a result of biological relevance is fraught 
with difficulties (8). 

A number of resources have undertaken the task of 
annotating experimentally validated SLiM classes with 
the most notable examples being the Eukaryotic Linear 
Motif (ELM) (3), MiniMotif (9) and ScanSite (10) data- 
bases. These resources also allow searching of protein 
sequences for novel instances of these annotated classes 
using regular expression patterns or position-specific 
scoring matrices. However, due to the high likelihood of 
motifs occurring in a stochastic manner, the use of pattern 
matching alone produces a large number of false positive 
hits (6). Methods have, therefore, been developed to 
incorporate additional filters based on the attributes of 
SLiMs, including sequence conservation (11-13), struc- 
tural availability (14-16), biophysical feasibility (17) and 
biological keywords (18). Recently, a number of de novo 
motif prediction tools have also emerged, capable of pre- 
dicting new classes of SLiMs (19-22). However, difficulties 
arise in removing the experimental bias towards medically 
relevant proteins as well as biases due to evolutionary 
relationships (12). 
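
As a minimal sketch of the pattern-matching step described above, the following Python snippet scans a sequence with a regular expression and reports every match, including overlapping ones. The pattern is a simplified PxxP-like core chosen purely for illustration and is not an ELM class definition; the sequence and the function name `scan_motif` are likewise hypothetical.

```python
import re

# Illustrative pattern only: a simplified PxxP-like core, NOT an ELM class
# definition. The lookahead lets overlapping matches be reported.
PATTERN = re.compile(r"(?=(P..P))")


def scan_motif(sequence, pattern=PATTERN):
    """Return (start, end, matched_text) per match, 1-based inclusive."""
    hits = []
    for match in pattern.finditer(sequence):
        start = match.start(1) + 1  # convert 0-based index to 1-based
        end = match.end(1)          # exclusive 0-based end == inclusive 1-based
        hits.append((start, end, match.group(1)))
    return hits


# Hypothetical sequence, made up for this example.
sequence = "MSPAAPKRPLPPRS"
for hit in scan_motif(sequence):
    print(hit)
```

Even this 14-residue made-up sequence yields three matches, which illustrates why short, degenerate patterns need the additional filters discussed in the text.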

A number of resources have been developed using PPI 
data, to help predict the SLiM functional class associated 
with a particular protein-binding domain. Dilimot (21)
and SLiMFinder (19) use the over-representation of
sequence motifs in proteins known to interact with a
particular globular domain to predict the regular expression
of the binding SLiM; the ADAN database (23) uses
high-resolution structures to predict SLiM-mediated
interactions for well-known modular protein domains (SH3,
SH2, WW, etc.). In contrast, NetworKIN (24) employs
interaction data to predict which kinase is responsible
for a particular phosphorylation site. The identification
of SLiM-mediated interactions within PPI data on the
fly has, to the best of our knowledge, not yet been
investigated. To address this gap, we introduce the iELM
server, which uses the annotated ELM regular expressions,
specially trained Hidden Markov Models (HMMs) based
on the manual annotation of SLiM-binding domains, and
PPI data to identify SLiM-mediated interactions. In
addition, iELM takes into consideration many of the
important attributes of SLiMs identified in some of the
aforementioned studies, including the tendency of SLiMs
to occur in regions of intrinsic disorder (25) and the
propensity of functional motifs to be evolutionarily conserved
(6,13). The iELM web server allows the identification of
SLiM-mediated interactions associated with a protein of
interest or within a user's PPI network.



THE iELM ALGORITHM 

The iELM algorithm has been previously described and
benchmarked (1) and can be summarized as follows:
iELM assesses binary protein associations for
SLiM-mediated interactions using the ELM annotated
regular expressions together with HMMs (26) trained on
manually annotated SLiM-binding domains and their
orthologs. The assessment of whether or not a binary
interaction is SLiM-mediated can be divided into four
parts. The first part uses the 3DID database (27)
to check whether the two proteins interact via a
domain-domain interaction. If they do not, the second and third
parts run simultaneously to assess each protein for
SLiM and SLiM-binding domain matches. In the second
part, SLiMs are identified using the regular expressions
annotated by the ELM resource and scored using the
SLiMSearch algorithm (12), based on the conservation of
the motif in a multiple sequence alignment of the queried
protein and its orthologs. The predicted SLiM and its
surrounding amino acids are also assessed by the IUPred
algorithm (14) for their propensity to be in a region of
intrinsic disorder. In the third part, SLiM-binding
domains are identified by HMMs trained to recognize
SLiM-binding domains using the HMMSearch
programme (26). An option is also available to search using
Pfam HMMs (28); however, these HMMs do not
take into account the specificity of motifs for subgroups
of a domain family. In the fourth part, if a complementary SLiM and
SLiM-binding domain partnership exists within the two
associated proteins, the algorithm applies a cut-off system
based on the results from the benchmarking data
sets (see Supplementary Figure S1), as well as
recommendations present in the respective papers (1,12,14).
The respective cut-offs are 0.3 for disorder scores, 0.6
for motif scores and 0.35 for domain scores. Any scores
below these values will not be returned by the web server.
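
The decision logic summarized above can be sketched in Python as follows. This is an illustrative reading of the description, not the server's code: the class and function names, the `elm_class` matching rule used to decide that a motif and a domain are complementary, and the early exit when a domain-domain interaction is found are all assumptions. Only the three cut-off values are taken from the text.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text; scores below them are not returned.
DISORDER_CUTOFF = 0.3
MOTIF_CUTOFF = 0.6
DOMAIN_CUTOFF = 0.35


@dataclass(frozen=True)
class MotifHit:
    elm_class: str         # hypothetical label for the motif class
    motif_score: float     # conservation-based motif score
    disorder_score: float  # intrinsic disorder score


@dataclass(frozen=True)
class DomainHit:
    binds_elm_class: str   # hypothetical: motif class this domain binds
    domain_score: float    # HMM-based domain score


def slim_mediated_pairs(has_domain_domain_contact, motif_hits, domain_hits):
    """Return (motif, domain) pairs that are complementary and pass all cut-offs."""
    if has_domain_domain_contact:
        # Assumption: a known domain-domain interaction ends the assessment.
        return []
    pairs = []
    for motif in motif_hits:
        for domain in domain_hits:
            complementary = motif.elm_class == domain.binds_elm_class
            passes = (
                motif.disorder_score >= DISORDER_CUTOFF
                and motif.motif_score >= MOTIF_CUTOFF
                and domain.domain_score >= DOMAIN_CUTOFF
            )
            if complementary and passes:
                pairs.append((motif, domain))
    return pairs


# Hypothetical hits: motifs from one protein, domains from its partner.
motifs = [MotifHit("CLASS_X", 0.8, 0.5), MotifHit("CLASS_X", 0.5, 0.5)]
domains = [DomainHit("CLASS_X", 0.4), DomainHit("CLASS_Z", 0.9)]
print(slim_mediated_pairs(False, motifs, domains))
```

In this sketch the function checks one direction only (motifs in one protein, domains in the other); a caller would run it a second time with the roles of the two proteins swapped.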

Precalculated data 

The calculations by iELM are time-consuming and there- 
fore, to ensure the results from the iELM server are 
returned in a reasonable time, the majority of the data is 
precalculated. The HMMs for SLiM-binding domains (1) 
were used to scan the human UniProt database (29) and 
all hits above a predefined cut-off were recorded. The 
precalculated conservation scores were calculated using 
the SLiMSearch algorithm based on a multiple sequence 
alignment of orthologous proteins identified using the 
Gopher programme (30) from a database of 70 complete 
EnsEMBL proteomes (Ensembl 59) (31). The SLiMSearch 
algorithm used all the SLiM classes annotated within the 
ELM database. Disorder scores for each motif were 
calculated using IUPred. All the protein-protein
associations annotated within the STRING database (version
9.0; STRING score >0.6) (32) were assessed by iELM
for SLiM-mediated interactions.
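
As a small illustration of the STRING filtering step, the sketch below keeps only associations whose score exceeds 0.6 and collapses pairs listed in both directions. It assumes the scores have already been expressed on a 0-1 scale; the protein identifiers and the function name `high_quality_pairs` are hypothetical.

```python
STRING_SCORE_CUTOFF = 0.6  # threshold quoted in the text


def high_quality_pairs(associations, cutoff=STRING_SCORE_CUTOFF):
    """Keep each unordered protein pair whose score exceeds the cut-off."""
    kept = set()
    for protein_a, protein_b, score in associations:
        if score > cutoff:
            kept.add(tuple(sorted((protein_a, protein_b))))
    return sorted(kept)


# Hypothetical associations with scores already on a 0-1 scale.
associations = [
    ("PROT_A", "PROT_B", 0.92),
    ("PROT_B", "PROT_A", 0.92),  # same pair listed in the other direction
    ("PROT_A", "PROT_C", 0.60),  # not strictly above the cut-off
    ("PROT_C", "PROT_D", 0.75),
]
print(high_quality_pairs(associations))
```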

Technical details of the web server 

The web server is built using the Django web framework 
with an underlying PostgreSQL database and is written 
primarily in Python. The tables are produced using the
jQuery library and the graphical displays using the JavaScript
libraries Raphael and Dracula. The server is HTML 4.01
compliant and compatible with most commonly
used web browsers.



USER INTERFACE 

The iELM web server is freely available at
http://i.elm.eu.org with no login required. The server aims to provide a
user-friendly interface for exploring a protein or proteome 
of interest for SLiM-mediated interactions. The server can 
be queried in two ways: 'protein iELM' searches the 
precalculated high-quality associations (score >0.6) from 
the STRING resource for SLiM-mediated interactions, 
whereas 'proteomic iELM' allows users to explore their 
own protein-protein interactome of interest for SLiMs. 
The server also provides a list of all 835 annotated linear 
motif-binding domains that can be freely downloaded at 
http://i.elm.eu.org/domains.

Protein iELM 

For a single query protein, the 'protein iELM' server 
searches a precalculated database, based on results of 
the iELM algorithm using the high-quality interactions 
from the STRING database (see Figure 1). 

Input 

A single protein ID is required as input, with a drop-down 
menu available to specify the type of sequence ID, which is 
subsequently used to query the ID mapping service 
provided by UniProt. The user can also choose between 
the specially trained iELM HMMs and the Pfam HMMs.



W366 Nucleic Acids Research, 2012, Vol. 40, Web Server issue 




[Figure 1: flowchart of the iELM server. Labels recovered from the figure: processing (UniProt ID mapping, STRING, database of precalculated data, iELM algorithm); output (table output, protein visualisation, interaction network visualisation, UniProt, Phospho.ELM); implementation (Python, Django, PostgreSQL); legend (general to both methods, proteomic iELM specific).]



Figure 1. An overview of the iELM server. The iELM server is divided into two sections: 'protein iELM' and 'proteomic iELM', each with different 
inputs. In the flowchart, the yellow coloured arrows are common to both processes whereas the orange arrows and the grey arrows are specific to 
'proteomic iELM' and 'protein iELM', respectively. The processes run by the iELM server can be divided into three sections: the input section at the 
top is displayed with a light blue background, the processing section is displayed in blue and the output section is displayed at the bottom in dark 
blue. The scripting languages and packages used for each section are displayed to the right of the flowchart. 



Upon submission of the job, the precalculated data are searched,
so that results are returned promptly.

Output 

The output is divided into a tabular and a graphical 
display: 

• The tabular output (see Figure 2A) consists of two
tables: the first table (if applicable) lists the SLiMs
found within the query protein; the second table (if
applicable) lists the SLiM-binding domains found
within the query protein. Both tables are divided
into three parts. The left part contains the UniProt
ID of the motif-containing protein, the motif type
(ELM functional class), the location of the motif, its
sequence and the associated scores. The central
part shows the UniProt ID of the protein containing
the motif-binding domain, the domain name
(Pfam) and the domain score. The final part provides
a link to Pepsite (17), via the 'Structure' button, for a
structural prediction of the interaction and a biophysical
feasibility assessment (if applicable). The table is
fully searchable and can be copied to the clipboard,
printed or downloaded as a comma-separated values
(CSV) document.

Figure 2. Description of iELM outputs. (A) Screenshot of the output from the 'protein iELM' with only the motif table shown. The web server also
shows an identical domain table (if applicable) describing the interactions of the SLiM-binding domain(s) in the queried protein. The table is divided
into two sections as displayed in the diagram. Also displayed in the figure, above the table, are the predicted motifs and SLiM-binding domains, as
well as information from the UniProt and Phospho.ELM resources about the modular architecture of the queried protein. The predicted motifs and
domains are fully clickable resulting in the sorting of the table whereas the annotated domains and phosphorylation sites link out to their respective
resources. (B) Screenshot of the network diagram displayed as an output for 'proteomic iELM'. The colour of the edges designates the type of ELM
class associated with the interaction. The colours of the nodes represent whether the protein contains a motif (yellow), a SLiM-binding domain
(green) or both a SLiM-binding domain and a motif (diagonal partition with both colours). Network diagrams specific for each interaction can be
produced by clicking the 'Interaction' button in the table displayed in the output section of 'proteomic iELM'.

• The graphical output (see Figure 2A) displays a
representation of the predicted SLiMs and SLiM-binding
domains along with the modular architecture of the
query protein extracted from the UniProt database
and the annotated phosphorylation sites from
Phospho.ELM (33). The modular architecture
predicted by the iELM method is divided by colour into
SLiM functional types, as classified by ELM (Ligand,
Targeting, Cleavage and Modification), with the
annotated instances from ELM outlined in green. A
key describing these types is fully clickable, enabling
the filtering of both the graphical and tabular
content. Tool tips are also integrated to give the
user immediate information on individual
SLiMs, SLiM-binding domains and UniProt
domains, as well as to help the user interpret the
output. The annotated domains are linked to the
UniProt database, and the individual predicted motifs
are also clickable, which filters the
tabular content.
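
Because the tabular output can be downloaded as CSV, results can be post-processed with a few lines of Python. The sketch below is illustrative only: the column names, identifiers and values are invented for the example and may not match the file the server actually produces, so they should be adapted to the real header row.

```python
import csv
import io

# Hypothetical export: the column names and values below are invented for
# illustration and may not match the file produced by the server.
EXAMPLE_CSV = """motif_protein,elm_class,start,end,sequence,motif_score,domain_protein,domain_name,domain_score
PROT_A,CLASS_X,10,15,PAAPKR,0.82,PROT_B,DomainY,0.51
PROT_A,CLASS_Z,40,44,RSLTP,0.65,PROT_C,DomainW,0.40
"""


def read_predictions(handle):
    """Parse rows into dictionaries, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row["motif_score"] = float(row["motif_score"])
        row["domain_score"] = float(row["domain_score"])
        rows.append(row)
    return rows


predictions = read_predictions(io.StringIO(EXAMPLE_CSV))
# Collect the distinct proteins predicted to carry a motif-binding domain.
partners = sorted({row["domain_protein"] for row in predictions})
print(partners)
```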



Proteomic iELM 

This section allows the user to input an individualised PPI 
network that is searched using the iELM algorithm (see 
Figure 1). 

Input 

The user may submit either a tabulated list of interactions 
or a list of IDs that will be searched in an all-against-all 
manner. Once again, a drop-down menu is available to 
specify the type of ID that the user wishes to input and 
for the type of HMMs the user wishes to use. There is a 
limit of 75 000 interactions for a tabulated list and 400 IDs 
for an all-against-all search. Upon submitting the job, the 
user is redirected to a wait page while the results are 
calculated. The waiting time is normally less than 5 min. 
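
The two input modes and their limits can be illustrated with a short Python sketch. How the server itself expands an ID list is not described here, so the treatment of an all-against-all search as unordered pairs without self-pairs is an assumption, as are the function names; only the limits of 400 IDs and 75 000 interactions come from the text.

```python
from itertools import combinations

MAX_IDS = 400              # limit for an all-against-all search
MAX_INTERACTIONS = 75_000  # limit for a tabulated list of interactions


def all_against_all(ids):
    """Expand an ID list into unordered pairs (assumed: no self-pairs)."""
    unique_ids = list(dict.fromkeys(ids))  # drop duplicates, keep order
    if len(unique_ids) > MAX_IDS:
        raise ValueError(f"at most {MAX_IDS} IDs are accepted")
    return list(combinations(unique_ids, 2))


def check_interaction_table(pairs):
    """Validate a user-supplied list of (protein_a, protein_b) pairs."""
    if len(pairs) > MAX_INTERACTIONS:
        raise ValueError(f"at most {MAX_INTERACTIONS} interactions are accepted")
    return list(pairs)


print(all_against_all(["PROT_A", "PROT_B", "PROT_C"]))
print(len(all_against_all([f"PROT_{i}" for i in range(MAX_IDS)])))
```

Under this assumption, 400 IDs expand to 400 × 399 / 2 = 79 800 pairs, so the sketch applies each limit only to its own input mode, as the text states them separately.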

Output 

As with the 'protein iELM' section, the output is divided into two
sections:

• The tabular output is of the same structure as 
described in 'protein iELM' (see Figure 2A), except 
only one table is displayed containing all the inter- 
actions, the originally queried protein is displayed 
next to the converted UniProt ID and there is an add- 
itional button called 'Interaction'. Clicking on this 
button leads to the production of a graphical repre- 
sentation of the PPIs linked to this interaction. If there 
are associations that are not predicted to be 
SLiM-mediated, an additional table is displayed in 
the left-hand column for users' inspection. In the 
same column, if any of the IDs submitted fail to be 
converted, a link is displayed that connects to a page 
displaying these proteins. 

• The graphical output contains the modular architecture
as outlined above, as well as a network of all
the connecting interactions in one connected cluster
of up to 75 proteins (Figure 2B). On the initial
production of the results page, a network is displayed
based on the best scoring SLiM-mediated interaction;
pressing the aforementioned 'Interaction' button in the
table changes the network shown. The edges of the network are
coloured according to the type of interaction (ELM
type) and the nodes are coloured according to
whether they contain a SLiM, a SLiM-binding
domain or both. Clicking on the 'Interaction' button
also reveals the globular architecture of the interacting
proteins of interest (as described in the 'Protein iELM'
section).
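
The network view described above can be approximated with a breadth-first search from a seed protein, capped at 75 nodes, plus a node-colouring rule. This is a sketch under stated assumptions: the way the server selects and truncates the cluster is not described, so the breadth-first strategy and the function names are hypothetical. The colours follow the Figure 2 caption (yellow for a motif, green for a SLiM-binding domain, both colours when a protein has both).

```python
from collections import deque

MAX_CLUSTER_SIZE = 75  # upper bound on proteins shown in one cluster


def node_roles(interactions):
    """Map each protein to its roles, given (motif_protein, domain_protein) pairs."""
    roles = {}
    for motif_protein, domain_protein in interactions:
        roles.setdefault(motif_protein, set()).add("motif")
        roles.setdefault(domain_protein, set()).add("domain")
    return roles


def node_colour(role_set):
    """Colour scheme taken from the Figure 2 caption."""
    if role_set == {"motif"}:
        return "yellow"
    if role_set == {"domain"}:
        return "green"
    return "yellow/green"  # both a motif and a SLiM-binding domain


def connected_cluster(interactions, seed, max_size=MAX_CLUSTER_SIZE):
    """Breadth-first walk from the seed, stopping once max_size proteins are collected."""
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_size:
        current = queue.popleft()
        for neighbour in sorted(neighbours.get(current, ())):
            if neighbour not in seen and len(visited) < max_size:
                seen.add(neighbour)
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


# Hypothetical predicted interactions: (motif-containing, domain-containing).
interactions = [("A", "B"), ("B", "C"), ("D", "E")]
roles = node_roles(interactions)
for protein in connected_cluster(interactions, "A"):
    print(protein, node_colour(roles[protein]))
```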



FUTURE WORK 

Currently, only the human proteome is fully searchable;
however, in the near future we plan to include additional
model organisms. We also wish to incorporate an
additional section that will allow users to search PPIs with
their own regular expressions and SLiM-binding
domains. We will update iELM regularly to ensure
newly annotated binding domains are incorporated into 
the precalculated data. To further facilitate our annota- 
tion process, we have included a form in the domains 
section, which allows users to inform us of known linear 
motif-binding domains that are not presently annotated in 
iELM. 



CONCLUSIONS 

The iELM web server is, to the best of our knowledge, the 
first algorithm that facilitates the exploration and identi- 
fication of SLiM-mediated interactions within PPI 
networks on the fly. The user-friendly platform allows 
enquiries at the single protein level as well as within 
large-scale proteomic studies. The iELM resource can, 
therefore, be useful in guiding experimental studies and 
facilitating the analysis of pathways within PPI 
networks. To accommodate a wide range of users, the
server supports multiple database identifier types as input
and allows results to be downloaded as an easily
parsable CSV file. The web server is freely available
at http://i.elm.eu.org.



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Figure 1. 